An introduction to S3, Boto and NEXRAD on S3

Adapted from, and with thanks to, the first tutorial by Valliappa Lakshmanan, formerly at Climate Corp and now at Google:

https://eng.climate.com/2015/10/27/how-to-read-and-display-nexrad-on-aws-using-python/

Amazon Simple Storage Service (Amazon S3) is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. It is designed to deliver 99.999999999% durability, and scale past trillions of objects worldwide.

Boto is a Python package that provides interfaces to Amazon Web Services.


In [48]:
# Let's import some stuff!
import boto
from boto.s3.connection import S3Connection
from datetime import timedelta, datetime
import os
import pyart
from matplotlib import pyplot as plt
import tempfile
import numpy as np

%matplotlib inline

From https://aws.amazon.com/noaa-big-data/nexrad/ :

The NEXRAD Level II archive data is hosted in the “noaa-nexrad-level2” Amazon S3 bucket in S3’s US East region. The address for the public bucket is:

http://noaa-nexrad-level2.s3.amazonaws.com

https://noaa-nexrad-level2.s3.amazonaws.com

Each volume scan file is its own object in Amazon S3. The basic data format is the following:

/<Year>/<Month>/<Day>/<NEXRAD Station>/<filename>

Where:

<Year> is the year the data was collected

<Month> is the month of the year the data was collected

<Day> is the day of the month the data was collected

<NEXRAD Station> is the NEXRAD ground station (map of ground stations)

<filename> is the name of the file containing the data. These are compressed files (compressed with gzip). The file name has more precise timestamp information.
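Because of this layout, the prefix for one station on one day can be built straight from a datetime. A minimal sketch; `nexrad_day_prefix` is our own hypothetical helper name, not part of boto or the AWS documentation. It leaves out the leading slash, since the keys in the bucket listing further down start directly with the year.

```python
from datetime import datetime


def nexrad_day_prefix(station, when):
    """Return the key prefix <Year>/<Month>/<Day>/<NEXRAD Station>/
    for a four-letter station ID and a datetime (assumed to be UTC)."""
    return when.strftime('%Y/%m/%d/') + station.upper() + '/'


print(nexrad_day_prefix('KVNX', datetime(2011, 5, 20, 0, 0, 23)))  # prints 2011/05/20/KVNX/
```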

All files in the archive use the same compressed format (.gz). The data file names are, for example, KAKQ20010101_080138.gz. The file naming convention is:

GGGGYYYYMMDD_TTTTTT

Where:

GGGG = Ground station ID (map of ground stations)
YYYY = year
MM = month
DD = day
TTTTTT = time when data started to be collected (GMT)

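Note that the 2015 files have an additional field in the file name: “_V06” is appended to the end of the file name. An example is KABX20150303_001050_V06.gz. (In practice files from other years carry a version field too; the 2011 KVNX files listed below, for example, end in _V06.gz.)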
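The naming convention can be parsed with a regular expression that treats both the version field and the .gz extension as optional. This is a sketch based on our own reading of the convention above; `parse_nexrad_filename` is a hypothetical helper. The optional .gz matters because the search function later in this notebook also expects names ending in V06 with no extension, and names that do not fit (for example auxiliary metadata files) raise ValueError.

```python
import re
from datetime import datetime

# GGGGYYYYMMDD_TTTTTT, optionally followed by a version such as _V06
# and/or a .gz extension (our own reading of the convention)
_NEXRAD_NAME = re.compile(
    r'^(?P<station>[A-Z0-9]{4})'
    r'(?P<stamp>\d{8}_\d{6})'
    r'(?:_(?P<version>V\d{2}))?'
    r'(?:\.gz)?$'
)


def parse_nexrad_filename(name):
    """Split a Level II file name into (station, datetime, version).

    version is None when the name has no _Vnn field. Raises ValueError
    for names that do not follow the convention.
    """
    match = _NEXRAD_NAME.match(name)
    if match is None:
        raise ValueError('not a NEXRAD Level II volume name: ' + name)
    when = datetime.strptime(match.group('stamp'), '%Y%m%d_%H%M%S')
    return match.group('station'), when, match.group('version')


print(parse_nexrad_filename('KAKQ20010101_080138.gz'))
print(parse_nexrad_filename('KABX20150303_001050_V06.gz'))
```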


In [8]:
# First, connect anonymously (the bucket is public, so no credentials are needed)
conn = S3Connection(anon=True)
bucket = conn.get_bucket('noaa-nexrad-level2')

In [10]:
# dir() shows there is a LOT we can do with a bucket!
dir(bucket)


Out[10]:
['BucketPaymentBody',
 'LoggingGroup',
 'MFADeleteRE',
 'VersionRE',
 'VersioningBody',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_delete_key_internal',
 '_get_all',
 '_get_all_query_args',
 '_get_key_internal',
 'add_email_grant',
 'add_user_grant',
 'cancel_multipart_upload',
 'complete_multipart_upload',
 'configure_lifecycle',
 'configure_versioning',
 'configure_website',
 'connection',
 'copy_key',
 'delete',
 'delete_cors',
 'delete_key',
 'delete_keys',
 'delete_lifecycle_configuration',
 'delete_policy',
 'delete_tags',
 'delete_website_configuration',
 'disable_logging',
 'enable_logging',
 'endElement',
 'generate_url',
 'get_acl',
 'get_all_keys',
 'get_all_multipart_uploads',
 'get_all_versions',
 'get_cors',
 'get_cors_xml',
 'get_key',
 'get_lifecycle_config',
 'get_location',
 'get_logging_status',
 'get_policy',
 'get_request_payment',
 'get_subresource',
 'get_tags',
 'get_versioning_status',
 'get_website_configuration',
 'get_website_configuration_obj',
 'get_website_configuration_with_xml',
 'get_website_configuration_xml',
 'get_website_endpoint',
 'get_xml_acl',
 'get_xml_tags',
 'initiate_multipart_upload',
 'key_class',
 'list',
 'list_grants',
 'list_multipart_uploads',
 'list_versions',
 'lookup',
 'make_public',
 'name',
 'new_key',
 'set_acl',
 'set_as_logging_target',
 'set_canned_acl',
 'set_cors',
 'set_cors_xml',
 'set_key_class',
 'set_policy',
 'set_request_payment',
 'set_subresource',
 'set_tags',
 'set_website_configuration',
 'set_website_configuration_xml',
 'set_xml_acl',
 'set_xml_logging',
 'set_xml_tags',
 'startElement',
 'validate_get_all_versions_params',
 'validate_kwarg_names']
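Most of that list is Python internals (names starting with an underscore). A small filter keeps only the public names; `public_attributes` is our own helper name, and calling `public_attributes(bucket)` should leave just the entries from 'BucketPaymentBody' onward minus the underscored ones.

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


class Demo:
    def _hidden(self):
        pass

    def visible(self):
        pass


print(public_attributes(Demo()))  # prints ['visible']
```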

The contents of the bucket are available through the bucket.list() method.


In [12]:
my_list = bucket.list()
help(my_list)


Help on BucketListResultSet in module boto.s3.bucketlistresultset object:

class BucketListResultSet(builtins.object)
 |  A resultset for listing keys within a bucket.  Uses the bucket_lister
 |  generator function and implements the iterator interface.  This
 |  transparently handles the results paging from S3 so even if you have
 |  many thousands of keys within the bucket you can iterate over all
 |  keys in a reasonably efficient manner.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, bucket=None, prefix='', delimiter='', marker='', headers=None, encoding_type=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

We can see this is an iterable result set. Printing the whole listing would be YUUUUGE, so we want to subset it, which we can do via the prefix keyword. We then cast the result to a list.


In [15]:
my_pref = '2011/05/20/KVNX/'
bucket_list = list(bucket.list(prefix = my_pref))

In [20]:
print(bucket_list[0:10])


[<Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_000023_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_000442_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_000901_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_001320_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_001740_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_002201_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_002620_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_003040_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_003459_V06.gz>, <Key: noaa-nexrad-level2,2011/05/20/KVNX/KVNX20110520_003918_V06.gz>]
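Casting to a list pulls every key under the prefix. If you only want a peek, `itertools.islice` stops after the first few items. A sketch with a hypothetical `peek` helper; that this avoids fetching later pages from S3 is an assumption based on the paging behaviour described in the `BucketListResultSet` help text above.

```python
from itertools import islice


def peek(iterable, n=10):
    """Return the first n items of an iterable, stopping as soon as they
    have been read instead of walking the whole sequence."""
    return list(islice(iterable, n))
```

Usage in this notebook would be `peek(bucket.list(prefix=my_pref), 10)`.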

So we have a list of keys (objects) in an S3 bucket. We can access an item directly and download it to a local file using the get_contents_to_filename method.


In [34]:
home_dir = os.path.expanduser('~')
bucket_list[0].get_contents_to_filename(os.path.join(home_dir,'nexrad_tempfile'))
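Since the bucket is public, the same object can also be fetched over plain HTTPS from the address AWS gives above, without boto at all. A minimal sketch assuming the bucket is still served at that address; `nexrad_object_url` and `download_nexrad` are our own hypothetical helpers, and keys containing characters that need URL quoting are not handled (NEXRAD keys do not appear to need it).

```python
import shutil
import urllib.request

# Public address of the bucket, as given by AWS above
BUCKET_URL = 'https://noaa-nexrad-level2.s3.amazonaws.com'


def nexrad_object_url(key_name):
    """Build the public HTTPS URL for an object key in the bucket."""
    return BUCKET_URL + '/' + key_name.lstrip('/')


def download_nexrad(key_name, local_path):
    """Stream one object to local_path over HTTPS (no credentials needed)."""
    with urllib.request.urlopen(nexrad_object_url(key_name)) as response, \
            open(local_path, 'wb') as out:
        shutil.copyfileobj(response, out)
    return local_path
```

For example, `download_nexrad(bucket_list[0].key, os.path.join(home_dir, 'nexrad_tempfile'))` should produce the same file as the boto call.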

OK, that was easy! Let's take a quick look at it with Py-ART.


In [36]:
radar = pyart.io.read(os.path.join(home_dir,'nexrad_tempfile'))

In [37]:
print(radar.info())  # info() prints the summary itself and returns None, hence the final "None"


altitude:
	data: <ndarray of type: float64 and shape: (1,)>
	positive: up
	long_name: Altitude
	standard_name: Altitude
	units: meters
altitude_agl: None
antenna_transition: None
azimuth:
	data: <ndarray of type: float64 and shape: (8280,)>
	comment: Azimuth of antenna relative to true north
	axis: radial_azimuth_coordinate
	long_name: azimuth_angle_from_true_north
	standard_name: beam_azimuth_angle
	units: degrees
elevation:
	data: <ndarray of type: float32 and shape: (8280,)>
	comment: Elevation of antenna relative to the horizontal plane
	axis: radial_elevation_coordinate
	long_name: elevation_angle_from_horizontal_plane
	standard_name: beam_elevation_angle
	units: degrees
fields:
	reflectivity:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: -32.0
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 94.5
		long_name: Reflectivity
		standard_name: equivalent_reflectivity_factor
		units: dBZ
	differential_reflectivity:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: -7.875
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 7.9375
		long_name: log_differential_reflectivity_hv
		standard_name: log_differential_reflectivity_hv
		units: dB
	velocity:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: -95.0
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 95.0
		long_name: Mean doppler Velocity
		standard_name: radial_velocity_of_scatterers_away_from_instrument
		units: meters_per_second
	spectrum_width:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: -63.5
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 63.0
		long_name: Spectrum Width
		standard_name: doppler_spectrum_width
		units: meters_per_second
	cross_correlation_ratio:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: 0.0
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 1.0
		long_name: Cross correlation_ratio (RHOHV)
		standard_name: cross_correlation_ratio_hv
		units: ratio
	differential_phase:
		data: <ndarray of type: float32 and shape: (8280, 1832)>
		valid_min: 0.0
		_FillValue: -9999.0
		coordinates: elevation azimuth range
		valid_max: 360.0
		long_name: differential_phase_hv
		standard_name: differential_phase_hv
		units: degrees
fixed_angle:
	data: <ndarray of type: float32 and shape: (17,)>
	long_name: Target angle for sweep
	standard_name: target_fixed_angle
	units: degrees
instrument_parameters:
	nyquist_velocity:
		data: <ndarray of type: float32 and shape: (8280,)>
		comments: Unambiguous velocity
		long_name: Nyquist velocity
		meta_group: instrument_parameters
		units: meters_per_second
	unambiguous_range:
		data: <ndarray of type: float32 and shape: (8280,)>
		comments: Unambiguous range
		long_name: Unambiguous range
		meta_group: instrument_parameters
		units: meters
latitude:
	data: <ndarray of type: float64 and shape: (1,)>
	long_name: Latitude
	standard_name: Latitude
	units: degrees_north
longitude:
	data: <ndarray of type: float64 and shape: (1,)>
	long_name: Longitude
	standard_name: Longitude
	units: degrees_east
nsweeps: 17
ngates: 1832
nrays: 8280
radar_calibration: None
range:
	data: <ndarray of type: float32 and shape: (1832,)>
	comment: Coordinate variable for range. Range to center of each bin.
	meters_to_center_of_first_gate: 2125.0
	axis: radial_range_coordinate
	spacing_is_constant: true
	meters_between_gates: 250.0
	long_name: range_to_measurement_volume
	standard_name: projection_range_coordinate
	units: meters
scan_rate: None
scan_type: ppi
sweep_end_ray_index:
	data: <ndarray of type: int32 and shape: (17,)>
	long_name: Index of last ray in sweep, 0-based
	units: count
sweep_mode:
	data: <ndarray of type: |S20 and shape: (17,)>
	comment: Options are: "sector", "coplane", "rhi", "vertical_pointing", "idle", "azimuth_surveillance", "elevation_surveillance", "sunscan", "pointing", "manual_ppi", "manual_rhi"
	long_name: Sweep mode
	standard_name: sweep_mode
	units: unitless
sweep_number:
	data: <ndarray of type: int32 and shape: (17,)>
	long_name: Sweep number
	standard_name: sweep_number
	units: count
sweep_start_ray_index:
	data: <ndarray of type: int32 and shape: (17,)>
	long_name: Index of first ray in sweep, 0-based
	units: count
target_scan_rate: None
time:
	data: <ndarray of type: float64 and shape: (8280,)>
	comment: Coordinate variable for time. Time at the center of each ray, in fractional seconds since the global variable time_coverage_start
	calendar: gregorian
	long_name: time_in_seconds_since_volume_start
	standard_name: time
	units: seconds since 2011-05-20T00:00:23Z
metadata:
	history: 
	vcp_pattern: 12
	source: 
	title: 
	version: 1.3
	institution: 
	Conventions: CF/Radial instrument_parameters
	original_container: NEXRAD Level II
	comment: 
	instrument_name: 
	references: 
None
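The summary shows every field stored as one (nrays, ngates) = (8280, 1832) array covering all 17 sweeps, with sweep_start_ray_index and sweep_end_ray_index marking where each sweep begins and ends (0-based, end inclusive). The gates are evenly spaced, so the centre of the last gate sits at 2125 + 1831 × 250 = 459,875 m, roughly 460 km out. A NumPy sketch of both ideas; the helper names are hypothetical, and in practice Py-ART already provides the ranges in `radar.range['data']` and the sweep slice via `radar.get_slice(sweep)`.

```python
import numpy as np


def sweep_rays(field_data, sweep_start_ray_index, sweep_end_ray_index, sweep):
    """Return the rows of a (nrays, ngates) field belonging to one sweep.

    Both index arrays are 0-based and the end index is inclusive,
    as their long_name entries in radar.info() state, hence the + 1.
    """
    start = sweep_start_ray_index[sweep]
    end = sweep_end_ray_index[sweep]
    return field_data[start:end + 1]


def gate_ranges(first_gate_m, gate_spacing_m, ngates):
    """Range (m) to the centre of each gate, assuming constant spacing."""
    return first_gate_m + gate_spacing_m * np.arange(ngates)


# Values copied from the radar.info() output above
ranges = gate_ranges(2125.0, 250.0, 1832)
print(ranges[0], ranges[-1])
```

With the radar object, `sweep_rays(radar.fields['reflectivity']['data'], radar.sweep_start_ray_index['data'], radar.sweep_end_ray_index['data'], 0)` would give the lowest sweep.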

In [40]:
my_figure = plt.figure(figsize = [10,8])
my_display = pyart.graph.RadarDisplay(radar)
my_display.plot_ppi('reflectivity', 0, vmin = -12, vmax = 64)


OK! How do I search for the volume I want, and open it easily in Py-ART? Here is a short, documented function that does both.


In [42]:
#Helper function for the search
def _nearestDate(dates, pivot):
    return min(dates, key=lambda x: abs(x - pivot))


def get_radar_from_aws(site, datetime_t):
    """
    Get the closest volume of NEXRAD data to a particular datetime.

    Only volumes from the same UTC day as datetime_t are searched.

    Parameters
    ----------
    site : string
        four letter radar designation
    datetime_t : datetime
        desired date time (UTC)

    Returns
    -------
    radar : Py-ART Radar Object
        Radar closest to the queried datetime
    """

    # First create the query string for the bucket, knowing
    # how NOAA and AWS store the data: YYYY/MM/DD/SITE/
    my_pref = datetime_t.strftime('%Y/%m/%d/') + site + '/'

    # Connect to the bucket
    conn = S3Connection(anon=True)
    bucket = conn.get_bucket('noaa-nexrad-level2')

    # Get a list of files
    bucket_list = list(bucket.list(prefix=my_pref))

    # Build parallel lists of keys and datetimes to allow easy searching.
    # Names look like KVNX20110520_000023_V06.gz, KAKQ20010101_080138.gz
    # or KAMX20161006_192700_V06 (no .gz extension)
    keys = []
    datetimes = []

    for key in bucket_list:
        fname = os.path.basename(key.key)
        # Skip metadata files (names ending in _MDM), if present
        if fname.endswith('_MDM'):
            continue
        try:
            # Characters 4-18 hold YYYYMMDD_HHMMSS after the station ID
            dt = datetime.strptime(fname[4:19], '%Y%m%d_%H%M%S')
        except ValueError:
            continue  # not a volume file we recognise
        datetimes.append(dt)
        keys.append(key)

    if not datetimes:
        raise ValueError('No radar volumes found under ' + my_pref)

    # Find the closest available radar to your datetime
    closest_datetime = _nearestDate(datetimes, datetime_t)
    index = datetimes.index(closest_datetime)

    # Download to a temporary file and read it with Py-ART; the file is
    # deleted when the with-block exits (reopening a NamedTemporaryFile
    # by name works on Unix-like systems, not on Windows)
    with tempfile.NamedTemporaryFile() as localfile:
        keys[index].get_contents_to_filename(localfile.name)
        radar = pyart.io.read(localfile.name)
    return radar
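`_nearestDate` scans every timestamp with `min()`, which is perfectly fine for one station-day. Since S3 lists keys in lexicographic order, and within one station's day prefix that is also chronological order, a binary search with `bisect` is an alternative. A sketch with a hypothetical `nearest_index` helper; sort the timestamps first if you are unsure of their order. Like `min()`, it resolves ties towards the earlier time.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_index(sorted_times, target):
    """Index of the entry in sorted_times closest to target.

    sorted_times must be in ascending order, e.g. the timestamps parsed
    from one station-day listing. Ties go to the earlier time.
    """
    if not sorted_times:
        raise ValueError('no times to search')
    i = bisect_left(sorted_times, target)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i - 1 if target - before <= after - target else i
```

Inside `get_radar_from_aws`, `index = nearest_index(datetimes, datetime_t)` would replace the `_nearestDate` and `datetimes.index` pair.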

Let's take it for a spin!


In [50]:
base_date = "20161006_192700"
fmt = '%Y%m%d_%H%M%S'
b_d = datetime.strptime(base_date, fmt)

my_radar = get_radar_from_aws('KAMX', b_d)
max_lat = 27
min_lat = 24
min_lon = -81
max_lon = -77

lal = np.arange(min_lat, max_lat, .5)
lol = np.arange(min_lon, max_lon, .5)

display = pyart.graph.RadarMapDisplay(my_radar)
fig = plt.figure(figsize = [10,8])
display.plot_ppi_map('reflectivity', sweep = 0, resolution = 'c',
                    vmin = -8, vmax = 64, mask_outside = False,
                    cmap = pyart.graph.cm.NWSRef,
                    min_lat = min_lat, min_lon = min_lon,
                    max_lat = max_lat, max_lon = max_lon,
                    lat_lines = lal, lon_lines = lol)


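`get_radar_from_aws` lists only one day's prefix, so a request at, say, 00:01 UTC cannot return a 23:58 volume from the previous day even when it would be closer. One way around this, sketched here with a hypothetical `prefixes_around` helper, is to list every day prefix that a time window around the request touches and merge the keys before searching. The window is assumed to be at most one day, so checking its two ends and the middle covers every day it spans.

```python
from datetime import datetime, timedelta


def prefixes_around(station, when, window=timedelta(minutes=30)):
    """Return the distinct day prefixes (YYYY/MM/DD/SITE/) touched by the
    interval [when - window, when + window]; window is assumed <= 1 day."""
    prefixes = []
    for moment in (when - window, when, when + window):
        prefix = moment.strftime('%Y/%m/%d/') + station + '/'
        if prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes
```

The keys could then be gathered with `for prefix in prefixes_around('KAMX', b_d): bucket_list.extend(bucket.list(prefix=prefix))` before parsing and searching as before.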
